
Conversation

@chraac
Contributor

chraac commented Nov 23, 2025

In PR #17344 we fixed several swiglu/silu ops but left hvx_fast_sigmoid_f32 unaddressed. This PR completes that work by fixing the remaining failing swiglu/silu cases.

Changes

  • Fix the hvx_fast_sigmoid_f32 implementation to resolve the remaining NaN and accuracy issues (see the sketch below)
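
The diff itself isn't shown in this thread, but the failure pattern (NaN only on the larger-magnitude [128,2,2,2] tensors) is typical of a fast sigmoid approximation being evaluated outside its valid input range. Below is a minimal scalar sketch of the intended numerics, assuming a clamp-based fix; the function names and clamp bound are illustrative, not the actual HVX vector code:

```c
#include <math.h>

// Fast vectorized exp/sigmoid approximations are only accurate on a limited
// input range; fed values outside it, they can return garbage or NaN. The
// standard remedy is to saturate the input first, since sigmoid is already
// flat (~0 or ~1) at large |x|. The bound below is an assumption, chosen so
// expf() stays finite in f32 (it overflows near x = 88.7).
static float sigmoid_f32_ref(float x) {
    if (x >  87.0f) x =  87.0f;
    if (x < -87.0f) x = -87.0f;
    return 1.0f / (1.0f + expf(-x));
}

static float silu_f32_ref(float x) {
    return x * sigmoid_f32_ref(x);  // SILU(x) = x * sigmoid(x)
}

static float swiglu_f32_ref(float a, float b) {
    return silu_f32_ref(a) * b;     // SWIGLU(a, b) = SILU(a) * b
}
```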

Before

[SILU] NaN at index 231 (HTP0=-nan CPU=88.449669)   SILU(type=f32,ne_a=[128,2,2,2],v=0): FAIL
  SILU(type=f32,ne_a=[5,7,11,13],v=0): OK
[SWIGLU] NaN at index 122 (HTP0=nan CPU=-11446.431641)   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
[SWIGLU] NMSE = 3.835742624 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
[SWIGLU] NaN at index 216 (HTP0=nan CPU=-8444.154297)   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,split): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,split): OK
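
For context, test-backend-ops reports NMSE (normalized mean squared error) between the backend output $a$ and the CPU reference $b$; assuming the conventional definition

$$\mathrm{NMSE}(a, b) = \frac{\sum_i (a_i - b_i)^2}{\sum_i b_i^2}$$

the swapped=1 case above was off by many orders of magnitude relative to the 1e-7 threshold even where no NaN appeared.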

After

  SILU(type=f32,ne_a=[128,2,2,2],v=0): OK
  SILU(type=f32,ne_a=[5,7,11,13],v=0): OK
  SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): OK
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
  SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): OK
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
  SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,split): OK
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,split): OK

@chraac chraac marked this pull request as draft November 23, 2025 16:02
@chraac chraac changed the title from "[WIP]ggml-hexagon: fix swiglu failure at test-backend-ops part2" to "ggml-hexagon: fix swiglu failure at test-backend-ops part2" Nov 23, 2025
@chraac chraac marked this pull request as ready for review November 23, 2025 16:06
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Nov 23, 2025
@max-krasnyansky
Collaborator

> In PR #17344 we fixed several swiglu/silu ops but left hvx_fast_sigmoid_f32 unaddressed. This PR completes that work by fixing the remaining failing swiglu/silu cases.
>
> Changes
>
>   • Fix the hvx_fast_sigmoid_f32 implementation to resolve the remaining NaN and accuracy issues

@chraac Looks like these changes already showed up in #17212.
I just tested and merged that, and we're now passing all SWIGLU and SILU tests.
I'll close this one as a duplicate. Please reopen if I got that wrong.

Excellent work! Thanks for the updates!

@chraac
Contributor Author

chraac commented Nov 24, 2025

> I'll close this one as a duplicate. Please reopen if I got that wrong.

Np, let's close this PR and merge the other one instead. This PR was created specifically to land the swiglu fix without including the buffer rework changes.

